
Introduction to Adversarial Machine Learning

#artificialintelligence

Here we are in 2019, where we keep seeing state-of-the-art (from now on, SOTA) classifiers published every day: some propose entirely new architectures, others propose tweaks needed to train a classifier more accurately. To keep things simple, let's talk about image classifiers, which have come a long way from GoogLeNet to AmoebaNet-A, the latter reaching 83% top-1 accuracy on ImageNet.

If we take an image and change a few pixels on it (not randomly), what looks the same to the human eye can cause these SOTA classifiers to fail miserably! I have a few benchmarks here; you can see how badly these classifiers fail even with the simplest perturbations. This is an alarming situation for the machine learning community, especially as we move closer and closer to adopting these SOTA models in real-world applications.

Let's discuss a few real-life examples to help understand the seriousness of the situation. Tesla has come a long way, and many self-driving car companies are trying to keep pace with them. Recently, however, it was shown that models used by Tesla can be fooled by placing simple stickers (adversarial patches) on the road, which the car interprets as the lane diverging, causing it to drive into oncoming traffic. The severity of this situation is very much underestimated even by Elon Musk (CEO of Tesla) himself, while I believe Andrej Karpathy (Head of AI, Tesla) is quite aware of how dangerous the situation is. This thread from Jeremy Howard (co-founder of fast.ai) says it all:

"In this clip, @elonmusk tells @lexfridman that adversarial examples are trivially easily fixed. @karpathy is that your experience at @tesla? @catherineols is that what the NeurIPS adversarial challenge found?"

A recently released paper showed that a stop sign manipulated with adversarial patches caused a SOTA model to begin "thinking" that it was a speed limit sign. This sounds scary, doesn't it?
Not to mention that these attacks can be used to make the networks predict whatever the attacker wants! Imagine an attacker manipulating road signs in a way such that self-driving cars break traffic rules.


DeepFool -- A simple and accurate method to fool deep Neural Networks.

#artificialintelligence

Let's go over the algorithm:

1. Take an input x and a classifier f; initialize the perturbed image to x and the loop counter to 1.
2. Loop while the true label and the label of the adversarially perturbed image are still the same.
3. In each iteration, find the closest decision hyperplane and calculate the projection of the current input onto it.
4. Add that projection (the minimal perturbation) to the input and continue until the label flips.

With multiclass classifiers, say the input is x: for each class there is a hyperplane (a flat surface that divides one class from the others), and x is classified according to where it lies in that space. All this algorithm does is find the closest hyperplane, project x onto that hyperplane, and push it a bit beyond, thus misclassifying x with the minimal perturbation possible.
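The projection step above can be sketched concretely for the simplest case: a linear binary classifier, where the decision boundary is a single hyperplane and the closest point is just the orthogonal projection. This is a minimal illustrative sketch assuming NumPy, not the full DeepFool implementation (which iterates this step over locally linearized class boundaries of a deep network); the function name, `overshoot` parameter, and toy numbers are my own choices.

```python
import numpy as np

def deepfool_linear(x, w, b, overshoot=0.02):
    """Minimal perturbation pushing x just past the hyperplane w.x + b = 0.

    For an affine classifier f(x) = w.x + b, the closest point on the
    decision boundary is the orthogonal projection of x onto the
    hyperplane. We move x onto that projection and slightly beyond
    (the overshoot), which flips the predicted sign.
    """
    f_x = np.dot(w, x) + b
    # Signed distance to the hyperplane is f(x)/||w||; multiplying by
    # -w/||w|| gives the projection step, scaled a bit past the boundary.
    r = -(1 + overshoot) * f_x / np.dot(w, w) * w
    return x + r

# Toy 2-D classifier (hypothetical numbers for illustration).
w = np.array([1.0, 2.0])
b = -1.0
x = np.array([3.0, 1.0])               # f(x) = 3 + 2 - 1 = 4 > 0

x_adv = deepfool_linear(x, w, b)
print(np.sign(np.dot(w, x) + b))       # 1.0  (original side)
print(np.sign(np.dot(w, x_adv) + b))   # -1.0 (label flipped)
print(np.linalg.norm(x_adv - x))       # size of the perturbation (small)
```

For a deep network, f is not affine, so DeepFool linearizes the classifier around the current point at every iteration and repeats this projection until the predicted label changes.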